Spoken Language Recognition in the Latent Topic Simplex

Authors

  • Kong-Aik Lee
  • Chang Huai You
  • Ville Hautamäki
  • Anthony Larcher
  • Haizhou Li
Abstract

This paper proposes the use of latent topic modeling for spoken language recognition, where a topic is defined as a discrete distribution over phone n-grams. The latent topics are trained in an unsupervised manner using the latent Dirichlet allocation (LDA) technique. Language recognition is then performed in a low-dimensional simplex defined by the latent topics. We apply the Bhattacharyya measure to compute the n-gram similarity in the topic simplex. Our study shows that some of the latent topics are language specific while others exhibit multilingual characteristics. Experiments conducted on the NIST 2007 language detection task show that language cues can be sufficiently preserved in the topic simplex.
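As a rough illustration of the similarity measure mentioned in the abstract, the sketch below computes the standard Bhattacharyya coefficient, BC(p, q) = sum_i sqrt(p_i * q_i), between two points on a topic simplex. The abstract does not say exactly how the measure is applied in the paper, so this is only an assumption about its general form; the function name and the four-topic proportions are hypothetical values chosen for illustration.

```python
import math


def bhattacharyya_coefficient(p, q):
    """Return sum_i sqrt(p_i * q_i) for two discrete distributions.

    Both inputs are assumed to be points on the same simplex:
    non-negative entries that sum to 1.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


# Hypothetical topic proportions for three utterances over 4 latent topics
# (illustrative numbers only, not taken from the paper).
utt_a = [0.70, 0.20, 0.05, 0.05]
utt_b = [0.65, 0.25, 0.05, 0.05]
utt_c = [0.05, 0.05, 0.20, 0.70]

sim_ab = bhattacharyya_coefficient(utt_a, utt_b)  # similar topic mixtures
sim_ac = bhattacharyya_coefficient(utt_a, utt_c)  # dissimilar topic mixtures
```

The coefficient equals 1 for identical distributions and decreases toward 0 as the two topic mixtures share less probability mass, so here `sim_ab` comes out larger than `sim_ac`.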


Similar resources

Rapid Unsupervised Topic Adaptation – a Latent Semantic Approach

In open-domain language exploitation applications, a wide variety of topics with swift topic shifts has to be captured. Consequently, it is crucial to rapidly adapt all language components of a spoken language system. This thesis addresses unsupervised topic adaptation in both monolingual and crosslingual settings. For automatic speech recognition we rapidly adapt a language model on a source l...


Segmented Topic Model for Text Classification and Speech Recognition

This paper presents a new segmented topic model (STM) to explore the topic regularities and simultaneously partition the text or spoken documents into coherent segments. The topic model based on the latent Dirichlet allocation (LDA) is adopted to extract the topics and is strengthened by incorporating a Markov chain to detect the segments in a document. STM is trained according to a variational...


Approximate Inference for Domain Detection in Spoken Language Understanding

This paper presents a semi-latent topic model for semantic domain detection in spoken language understanding systems. We use labeled utterance information to capture latent topics, which directly correspond to semantic domains. Additionally, we introduce an ’informative prior’ for Bayesian inference that can simultaneously segment utterances of known domains into classes and divide them from ou...


PCR detection of thymidine kinase gen of latent herpes simplex Virus type 1 in mice trigeminal ganglia

Herpes simplex virus type 1 establishes a latent infection in the peripheral nervous system following primary infection. During latent infection, the viral genome exhibits limited transcription, with the HSV LATs consistently detected in latently infected ganglia. Following ocular infection, viral latency develops in the trigeminal ganglia. In this study PCR has been used for detection of HSV-1 nuc...


Vector-based spoken language recognition using output coding

The vector-based spoken language recognition approach converts a spoken utterance into a high dimensional vector, also known as a bag-of-sounds vector, that consists of n-gram statistics of acoustic units. Dimensionality reduction would better prepare the bag-of-sounds vectors for classifier design. We propose projecting the bag-of-sounds vectors onto a low dimensional SVM output coding space, ...



Publication date: 2011